Round 1 - Technical
🔹 Write PySpark code to save a DataFrame to AWS S3 in Parquet format.
🔹 How do you overwrite a file stored in S3 using PySpark? (A sketch covering both of these questions follows this list.)
🔹 Explain versioning in S3.
🔹 Write an SQL query to generate the given output.
🔹 What are the steps to execute a Python file containing PySpark code on an AWS EC2 environment?
🔹 How do you copy a file from the local system to AWS S3 without using the upload feature of the S3 bucket? (See the boto3 sketch after this list.)
🔹 If you execute the same query in Snowflake and Spark, which one takes less time?
🔹 In the above scenario, which one will be costlier?
🔹 Have you worked on any cost optimization methods while loading data into Snowflake from a data lake like S3?
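
For the first two questions, a minimal PySpark sketch along the lines of the expected answer, assuming the cluster already has the S3A connector and AWS credentials configured; the bucket and paths below are placeholders:

```python
from pyspark.sql import SparkSession

# Assumes the hadoop-aws/S3A connector is on the classpath and credentials are
# available via the standard chain (IAM role, environment variables, etc.).
spark = SparkSession.builder.appName("write_to_s3").getOrCreate()

df = spark.read.csv("s3a://my-bucket/input/data.csv", header=True, inferSchema=True)

# Save as Parquet; mode("overwrite") replaces any existing data at the target path,
# which is also the usual answer to "how do you overwrite a file in S3 with PySpark".
df.write.mode("overwrite").parquet("s3a://my-bucket/output/data_parquet/")

spark.stop()
```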
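For the copy-without-the-console question, the usual answers are the AWS CLI (`aws s3 cp`) or the boto3 SDK. A minimal boto3 sketch, with hypothetical file, bucket, and key names:

```python
import boto3

# Credentials are assumed to come from the standard AWS credential chain
# (environment variables, ~/.aws/credentials, or an attached IAM role).
s3 = boto3.client("s3")
s3.upload_file(
    Filename="/tmp/report.parquet",   # local file to copy
    Bucket="my-bucket",               # target S3 bucket
    Key="reports/report.parquet",     # object key inside the bucket
)
```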
Round 2 - Techno-Managerial
This round was conducted in two phases and aimed to assess my knowledge of Spark, AWS, Snowflake, Python, and SQL. The questions were scenario-specific, focusing on AWS data engineering services such as Glue, Lambda, EC2, S3, Redshift, and Athena. Key topics included:
🔹 Deep dive into Spark's memory distribution for processing a 500 GB file (an illustrative configuration sketch follows this list).
🔹 Architecture-level questions.
🔹 Writing an entire PySpark program, from the import statements to the final `spark.stop()` call (a minimal skeleton follows this list).
🔹 Testing SQL skills with window functions using `LAG`, `LEAD`, and `DENSE_RANK` (an example query follows this list).
🔹 Questions on Spark optimization techniques.
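
For the 500 GB discussion, the conversation typically revolves around executor memory, cores, and shuffle partitions. A sketch of how such settings can be supplied when building the SparkSession; the numbers are placeholders, not recommendations:

```python
from pyspark.sql import SparkSession

# Illustrative settings only -- the right values depend on cluster size,
# file format, and the transformations involved.
spark = (
    SparkSession.builder
    .appName("large_file_job")
    .config("spark.executor.memory", "8g")          # heap per executor
    .config("spark.executor.memoryOverhead", "2g")  # off-heap/overhead per executor
    .config("spark.executor.cores", "4")            # concurrent tasks per executor
    .config("spark.sql.shuffle.partitions", "800")  # partitions after wide transformations
    .getOrCreate()
)

# A ~500 GB dataset read this way is split into many partitions; each task
# processes one partition, so no single executor holds the whole file in memory.
df = spark.read.parquet("s3a://my-bucket/big-dataset/")
```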
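A minimal end-to-end skeleton of the kind of program that was asked for, with hypothetical paths and column names:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

def main():
    # 1. Create the SparkSession (entry point for the DataFrame API).
    spark = SparkSession.builder.appName("end_to_end_example").getOrCreate()

    # 2. Read the source data (hypothetical path and schema).
    orders = spark.read.csv("s3a://my-bucket/raw/orders.csv", header=True, inferSchema=True)

    # 3. Transform: keep completed orders and aggregate revenue per customer.
    revenue = (
        orders
        .filter(F.col("status") == "COMPLETED")
        .groupBy("customer_id")
        .agg(F.sum("amount").alias("total_amount"))
    )

    # 4. Write the result back to S3 as Parquet.
    revenue.write.mode("overwrite").parquet("s3a://my-bucket/curated/revenue/")

    # 5. Release resources.
    spark.stop()

if __name__ == "__main__":
    main()
```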
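An example of the window-function style of query that was tested, written here through `spark.sql` so it stays runnable in PySpark; the `sales` table and its columns are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("window_functions").getOrCreate()

# Hypothetical sales data registered as a temp view for SQL access.
spark.read.parquet("s3a://my-bucket/curated/sales/").createOrReplaceTempView("sales")

result = spark.sql("""
    SELECT
        employee_id,
        sale_date,
        amount,
        LAG(amount)  OVER (PARTITION BY employee_id ORDER BY sale_date) AS prev_amount,
        LEAD(amount) OVER (PARTITION BY employee_id ORDER BY sale_date) AS next_amount,
        DENSE_RANK() OVER (PARTITION BY employee_id ORDER BY amount DESC) AS amount_rank
    FROM sales
""")
result.show()
```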
Round 3 - HR Discussion
🔹 Team culture discussion.
🔹 Leave and holiday policies.
🔹 Work culture discussion.
🔹 Salary negotiation.
🔹 Variable pay component discussion.
I received the offer letter after the final HR round.😊